Info file: cmu-user.info, -*-Text-*-
produced by latexinfo-format-buffer
from file: cmu-user.tex
File: cmu-user.info Node: Inline Expansion Recording, Prev: Inline Expansion, Up: Inline Expansion, Next: Semi-Inline Expansion
Inline Expansion Recording
--------------------------
Inline expansion requires that the source for the inline expanded
function be available when calls to the function are compiled. The
compiler doesn't remember the inline expansion for every function, since
that would take an excessive amount of space. Instead, the programmer
must tell the compiler to record the inline expansion before the
definition of the inline expanded function is compiled. This is done by
globally declaring the function inline before the function is defined,
using the `inline' or `extensions:maybe-inline' declaration.
In addition to recording the inline expansion of inline functions at the
time the function is compiled, `compile-file' also puts the inline
expansion in the output file. When the output file is loaded, the
inline expansion is made available for subsequent compilations; there is
no need to compile the definition again to record the inline expansion.
If a function is declared inline, but no expansion is recorded, then the
compiler will give an efficiency note like:
Note: MYFUN is declared inline, but has no expansion.
When you get this note, check that the `inline' declaration and the
definition appear before the calls that are to be inline expanded. This
note will also be given if the inline expansion for a `defun' could not
be recorded because the `defun' was in a non-null lexical environment.
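The recording idiom described above can be sketched as follows (the
function names here are hypothetical):

(proclaim '(inline square))     ; record expansion before the defun
(defun square (x) (* x x))

;; Calls compiled from here on can be inline expanded.
(defun sum-of-squares (a b)
  (+ (square a) (square b)))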
File: cmu-user.info Node: Semi-Inline Expansion, Prev: Inline Expansion Recording, Up: Inline Expansion, Next: The Maybe-Inline Declaration
Semi-Inline Expansion
---------------------
Python supports SEMI-INLINE functions. Semi-inline expansion shares a
single copy of a function across all the calls in a component by
converting the inline expansion into a local function (*Note Local
Call::.) This takes up less space when there are multiple calls, but
also provides less opportunity for context dependent optimization. When
there is only one call, the result is identical to normal inline
expansion. Semi-inline expansion is done when the `space' optimization
quality is `0', and the function has been declared
`extensions:maybe-inline'.
This mechanism of inline expansion combined with local call also allows
recursive functions to be inline expanded. If a recursive function is
declared `inline', calls will actually be compiled semi-inline.
Although recursive functions are often so complex that there is little
advantage to semi-inline expansion, it can still be useful in the same
sort of cases where normal inline expansion is especially advantageous,
i.e. functions where the calling context can help a lot.
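As an illustrative sketch (the function is made up), a recursive
function declared `inline' is actually compiled semi-inline:

(proclaim '(inline tree-size))
(defun tree-size (x)
  ;; Recursive, so calls are compiled semi-inline as a shared
  ;; local function rather than being expanded at each call site.
  (if (consp x)
      (+ (tree-size (car x)) (tree-size (cdr x)))
      1))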
File: cmu-user.info Node: The Maybe-Inline Declaration, Prev: Semi-Inline Expansion, Up: Inline Expansion
The Maybe-Inline Declaration
----------------------------
The `extensions:maybe-inline' declaration is a CMU Common Lisp
extension. It is similar to `inline', but indicates that inline
expansion may sometimes be desirable, rather than saying that inline
expansion should almost always be done. When used in a global
declaration, `extensions:maybe-inline' causes the expansion for the
named functions to be recorded, but the functions aren't actually inline
expanded unless `space' is `0' or the function is eventually (perhaps
locally) declared `inline'.
Use of the `extensions:maybe-inline' declaration followed by the `defun' is
preferable to the standard idiom of:
(proclaim '(inline myfun))
(defun myfun () ...)
(proclaim '(notinline myfun))
;;; Any calls to `myfun' here are not inline expanded.
(defun somefun ()
(declare (inline myfun))
;;
;; Calls to `myfun' here are inline expanded.
...)
The problem with using `notinline' in this way is that in CMU Common
Lisp it does more than just suppress inline expansion, it also forbids
the compiler to use any knowledge of `myfun' until a later `inline'
declaration overrides the `notinline'. This prevents compiler warnings
about incorrect calls to the function, and also prevents block
compilation.
The `extensions:maybe-inline' declaration is used like this:
(proclaim '(extensions:maybe-inline myfun))
(defun myfun () ...)
;;; Any calls to `myfun' here are not inline expanded.
(defun somefun ()
(declare (inline myfun))
;;
;; Calls to `myfun' here are inline expanded.
...)
(defun someotherfun ()
(declare (optimize (space 0)))
;;
;; Calls to `myfun' here are expanded semi-inline.
...)
In this example, the use of `extensions:maybe-inline' causes the
expansion to be recorded when the `defun' for `somefun' is compiled, and
doesn't waste space through doing inline expansion by default. Unlike
`notinline', this declaration still allows the compiler to assume that
the known definition really is the one that will be called when giving
compiler warnings, and also allows the compiler to do semi-inline
expansion when the policy is appropriate.
When the goal is merely to control whether inline expansion is done by
default, it is preferable to use `extensions:maybe-inline' rather than
`notinline'. The `notinline' declaration should be reserved for those
special occasions when a function may be redefined at run-time, so the
compiler must be told that the obvious definition of a function is not
necessarily the one that will be in effect at the time of the call.
File: cmu-user.info Node: Object Representation, Prev: Inline Expansion, Up: Advanced Compiler Use and Efficiency Hints, Next: Numbers
Object Representation
=====================
A somewhat subtle aspect of writing efficient CMU Common Lisp programs
is choosing the correct data structures so that the underlying objects
can be implemented efficiently. This is partly because of the need for
multiple representations for a given value, but is also due to the
sheer number of object types that CMU Common Lisp has built in. The
number of possible representations complicates the choice of a good
representation because semantically similar objects may vary in their
efficiency depending on how the program operates on them.
* Menu:
* Think Before You Use a List::
* Structures::
* Arrays::
* Vectors::
* Bit-Vectors::
* Hashtables::
File: cmu-user.info Node: Think Before You Use a List, Prev: Object Representation, Up: Object Representation, Next: Structures
Think Before You Use a List
---------------------------
Although Lisp's creator seemed to think that it was for LISt Processing,
the astute observer may have noticed that the chapter on list
manipulation makes up less than three percent of Common Lisp: the
Language II. The language has grown since Lisp 1.5 -- new data types
supersede lists for many purposes.
File: cmu-user.info Node: Structures, Prev: Think Before You Use a List, Up: Object Representation, Next: Arrays
Structures
----------
One of the best ways of building complex data structures is to define
appropriate structure types using `defstruct'. In Python, access of
structure slots is always at least as fast as list or vector access, and
is usually faster. In comparison to a list representation of a tuple,
structures also have a space advantage.
Even if structures weren't more efficient than other representations, structure
use would still be attractive because programs that use structures in
appropriate ways are much more maintainable and robust than programs written
using only lists. For example:
(rplaca (caddr (cadddr x)) (caddr y))
could have been written using structures in this way:
(setf (beverage-flavor (astronaut-beverage x)) (beverage-flavor y))
The second version is more maintainable because it is easier to
understand what it is doing. It is more robust because structure
accesses are type checked. An `astronaut' will never be confused with a
`beverage', and the result of `beverage-flavor' is always a flavor. See
sections *Note Structure Types:: and *Note The Freeze-Type Declaration::
for more information about structure types. *Note Type Inference:: for
a number of examples that make clear the advantages of structure typing.
Note that the structure definition should be compiled before any uses of
its accessors or type predicate so that these function calls can be
efficiently open-coded.
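The structure types assumed by the example above might be defined like
this (the slot layout is an illustrative guess):

(defstruct beverage
  (flavor nil))

(defstruct astronaut
  (name nil)
  (beverage nil :type (or null beverage)))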
File: cmu-user.info Node: Arrays, Prev: Structures, Up: Object Representation, Next: Vectors
Arrays
------
Arrays are often the most efficient representation for collections of objects
because:
* Array representations are often the most compact. An array is
always more compact than a list containing the same number of
elements.
* Arrays allow fast constant-time access.
* Arrays are easily destructively modified, which can reduce consing.
* Array element types can be specialized, which reduces both overall size and
consing.
Access of arrays that are not of type `simple-array' is less efficient,
so declarations are appropriate when an array is of a simple type like
`simple-string' or `simple-bit-vector'. Arrays are almost always
simple, but the compiler may not be able to prove simpleness at every
use. The only way to get a non-simple array is to use the
`:displaced-to', `:fill-pointer' or `:adjustable' arguments to `make-array'.
If you don't use these hairy options, then arrays can always be declared
to be simple.
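For example, declaring a string argument `simple-string' (rather than
`string') lets accesses be open-coded (the function itself is
hypothetical):

(defun count-spaces (s)
  (declare (simple-string s))
  (let ((n 0))
    ;; SCHAR on a declared simple-string compiles to a direct
    ;; indexed access with no generic dispatch.
    (dotimes (i (length s) n)
      (when (char= (schar s i) #\Space)
        (incf n)))))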
Because of the many specialized array types and the possibility of
non-simple arrays, array access is much like generic arithmetic. In
order for array accesses to be efficiently compiled, the element type
and simpleness of the array must be known at compile time. If there is
inadequate information, the compiler is forced to call a generic array
access routine. You can detect inefficient array accesses by enabling
efficiency notes.
File: cmu-user.info Node: Vectors, Prev: Arrays, Up: Object Representation, Next: Bit-Vectors
Vectors
-------
Vectors (one dimensional arrays) are particularly useful, since in
addition to their obvious array-like applications, they are also well
suited to representing sequences. In comparison to a list
representation, vectors are faster to access and take up between two and
sixty-four times less space (depending on the element type.) As with
arbitrary arrays, the compiler needs to know that vectors are not
complex, so you should use `simple-string' in preference to `string',
etc.
The only advantage that lists have over vectors for representing
sequences is that it is easy to change the length of a list, add to it,
and remove items from it. Likely signs of archaic, slow Lisp code are
`nth' and `nthcdr'. If you are using these functions you should
probably be using a vector.
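A sketch of the suggested replacement (the function is made up):

(defun process-items (items)
  ;; Coerce the list to a vector once, then index in constant
  ;; time, rather than calling NTH repeatedly on the list.
  (let ((v (coerce items 'simple-vector)))
    (dotimes (i (length v))
      (print (svref v i)))))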
File: cmu-user.info Node: Bit-Vectors, Prev: Vectors, Up: Object Representation, Next: Hashtables
Bit-Vectors
-----------
Another thing that lists have been used for is set manipulation. In
applications where there is a known, reasonably small universe of
items, bit-vectors can be used to improve performance. This is much less
convenient than using lists, because instead of symbols, each element in
the universe must be assigned a numeric index into the bit vector.
Using a bit-vector will nearly always be faster, and can be tremendously
faster if the number of elements in the set is not small. The logical
operations on `simple-bit-vector's are efficient, since they operate on
a word at a time.
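A sketch of bit-vector set manipulation (the universe size of 256 is
arbitrary):

;; Each item in the universe is assigned an index into the vector.
(let ((a (make-array 256 :element-type 'bit :initial-element 0))
      (b (make-array 256 :element-type 'bit :initial-element 0)))
  (setf (sbit a 3) 1)          ; add item 3 to set A
  (setf (sbit b 3) 1)          ; add item 3 to set B
  (bit-and a b))               ; intersection, a word at a time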
File: cmu-user.info Node: Hashtables, Prev: Bit-Vectors, Up: Object Representation
Hashtables
----------
Hashtables are an efficient and general mechanism for maintaining
associations such as the association between an object and its name.
Although hashtables are usually the best way to maintain associations,
efficiency and style considerations sometimes favor the use of an
association list (a-list).
`assoc' is fairly fast when the TEST argument is `eq' or `eql' and there
are only a few elements, but the time goes up in proportion with the
number of elements. In contrast, the hash-table lookup has a somewhat
higher overhead, but the speed is largely unaffected by the number of
entries in the table. For an `equal' hash-table or alist, hash-tables
have an even greater advantage, since the test is more expensive.
Whatever you do, be sure to use the most restrictive test function
possible.
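The two lookups compared above look like this (the variable names are
hypothetical):

;; A-list: time grows with the number of entries.
(cdr (assoc key alist :test #'eq))

;; Hash-table: higher fixed overhead, but largely independent of
;; the number of entries.
(gethash key table)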
The style argument observes that although hash-tables and alists
overlap in function, they do not do all things equally well.
* Alists are good for maintaining scoped environments. They were
originally invented to implement scoping in the Lisp interpreter,
and are still used for this in Python. With an alist one can
non-destructively change an association simply by consing a new
element on the front. This is something that cannot be done with
hash-tables.
* Hashtables are good for maintaining a global association. The
value associated with an entry can easily be changed with `setf'.
With an alist, one has to go through contortions, either
`rplacd''ing the cons if the entry exists, or pushing a new one if
it doesn't. The side-effecting nature of hash-table operations is
an advantage here.
Historically, symbol property lists were often used for global name
associations. Property lists provide an awkward and error-prone
combination of name association and record structure. If you must use
the property list, please store all the related values in a single
structure under a single property, rather than using many properties.
This makes access more efficient, and also adds a modicum of typing and
abstraction. *Note More About Types in Python:: for information on
types in CMU Common Lisp.
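A sketch of the recommended property list usage (the structure and
property names are made up):

;; Store all related values in one structure under one property,
;; rather than using separate COLOR and SIZE properties.
(defstruct widget-info
  color
  size)

(setf (get 'my-widget 'widget-info)
      (make-widget-info :color 'red :size 3))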
File: cmu-user.info Node: Numbers, Prev: Object Representation, Up: Advanced Compiler Use and Efficiency Hints, Next: General Efficiency Hints
Numbers
=======
Numbers are interesting because numbers are one of the few Common Lisp
data types that have direct support in conventional hardware. If a
number can be represented in the way that the hardware expects it, then
there is a big efficiency advantage.
Using hardware representations is problematical in Common Lisp due to dynamic typing
(where the type of a value may be unknown at compile time.) It is possible to
compile code for statically typed portions of a Common Lisp program with efficiency
comparable to that obtained in statically typed languages such as C, but not
all Common Lisp implementations succeed. There are two main barriers to efficient
numerical code in Common Lisp:
* The compiler must prove that the numerical expression is in fact statically
typed, and
* The compiler must be able to somehow reconcile the conflicting
demands of the hardware mandated number representation with the
Common Lisp requirements of dynamic typing and garbage-collecting
dynamic storage allocation.
Because of its type inference (*Note Type Inference::) and efficiency
notes, Python is better than conventional Common Lisp compilers at
ensuring that numerical expressions are statically typed. Python also
goes somewhat farther than existing compilers in the area of allowing
native machine number representations in the presence of garbage
collection.
* Menu:
* Descriptors::
* Non-Descriptor Representations::
* Variables::
* Generic Arithmetic::
* Fixnums::
* Word Integers::
* Floating Point Efficiency::
* Specialized Arrays::
* Interactions With Local Call::
* Representation of Characters::
File: cmu-user.info Node: Descriptors, Prev: Numbers, Up: Numbers, Next: Non-Descriptor Representations
Descriptors
-----------
Common Lisp's dynamic typing requires that it be possible to represent any value
with a fixed length object, known as a DESCRIPTOR. This fixed-length
requirement is implicit in features such as:
* Data types (like `simple-vector') that can contain any type of object, and
that can be destructively modified to contain different objects (of possibly
different types.)
* Functions that can be called with any type of argument, and that
can be redefined at run time.
In order to save space, a descriptor is invariably represented as a
single word. Objects that can be directly represented in the descriptor
itself are said to be IMMEDIATE. Descriptors for objects larger than
one word are in reality pointers to the memory actually containing the
object.
Representing objects using pointers has two major disadvantages:
* The memory pointed to must be allocated on the heap, so it must
eventually be freed by the garbage collector. Excessive heap
allocation of objects (or "consing") is inefficient in several
ways.
* Representing an object in memory requires the compiler to emit
additional instructions to read the actual value in from memory,
and then to write the value back after operating on it.
The introduction of garbage collection makes things even worse, since the
garbage collector must be able to determine whether a descriptor is an
immediate object or a pointer. This requires that a few bits in each
descriptor be dedicated to the garbage collector. The loss of a few bits
doesn't seem like much, but it has a major efficiency implication -- objects
whose natural machine representation is a full word (integers and
single-floats) cannot have an immediate representation. So the compiler is
forced to use an unnatural immediate representation (such as `fixnum') or a
natural pointer representation (with the attendant consing overhead.)
File: cmu-user.info Node: Non-Descriptor Representations, Prev: Descriptors, Up: Numbers, Next: Variables
Non-Descriptor Representations
------------------------------
From the discussion above, we can see that the standard descriptor
representation has many problems, the worst being number consing.
Common Lisp compilers try to avoid these descriptor efficiency problems by using
NON-DESCRIPTOR representations. A compiler that uses non-descriptor
representations can compile this function so that it does no number consing:
(defun multby (vec n)
(declare (type (simple-array single-float (*)) vec)
(single-float n))
(dotimes (i (length vec))
(setf (aref vec i)
(* n (aref vec i)))))
If a descriptor representation were used, each iteration of the loop
might cons two floats and do three times as many memory references.
As its negative definition suggests, the range of possible
non-descriptor representations is large. The performance improvement
from non-descriptor representation depends upon both the number of types
that have non-descriptor representations and the number of contexts in
which the compiler is forced to use a descriptor representation.
Many Common Lisp compilers support non-descriptor representations for
float types such as `single-float' and `double-float'. Python adds
support for full word integers, characters and system-area pointers
(unconstrained pointers.) Many Common Lisp compilers support
non-descriptor representations for variables and array elements; Python
adds support for non-descriptor arguments and return values in local
call.
File: cmu-user.info Node: Variables, Prev: Non-Descriptor Representations, Up: Numbers, Next: Generic Arithmetic
Variables
---------
In order to use a non-descriptor representation for a variable or expression
intermediate value, the compiler must be able to prove that the value is always
of a particular type having a non-descriptor representation. Type inference
(*Note Type Inference::) often needs some help from user-supplied
declarations. The best kind of type declaration is a variable type declaration
placed at the binding point:
(let ((x (car l)))
(declare (single-float x))
...)
Use of `the', or of variable declarations not at the binding form is
insufficient to allow non-descriptor representation of the variable -- with
these declarations it is not certain that all values of the variable are of the
right type. It is sometimes useful to introduce a gratuitous binding that
allows the compiler to change to a non-descriptor representation, like:
(etypecase x
((signed-byte 32)
(let ((x x))
(declare (type (signed-byte 32) x))
...))
...)
The declaration on the inner `x' is necessary here due to a phase
ordering problem. Although the compiler will eventually prove that the
outer `x' is a `(signed-byte 32)' within that `etypecase' branch, the
inner `x' would have been optimized away by that time. Declaring the
type makes let optimization more cautious.
Note that storing a value into a global (or `special') variable always
forces a descriptor representation. Wherever possible, you should
operate only on local variables, binding any referenced globals to local
variables at the beginning of the function, and doing any global
assignments at the end.
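A sketch of this style (the variable and function are hypothetical):

(defvar *total* 0.0)

(defun add-all (floats)
  ;; Bind the global to a local at entry, operate on the local
  ;; (which can use a non-descriptor representation), and assign
  ;; back at the end.
  (let ((total *total*))
    (declare (single-float total))
    (dolist (f floats)
      (incf total f))
    (setq *total* total)))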
Efficiency notes signal use of inefficient representations, so
programmers needn't continuously worry about the details of
representation selection.
File: cmu-user.info Node: Generic Arithmetic, Prev: Variables, Up: Numbers, Next: Fixnums
Generic Arithmetic
------------------
In CMU Common Lisp, arithmetic operations are GENERIC. (9) (*Note
Generic Arithmetic-Footnotes::) The `+' function can be passed
`fixnum's, `bignum's, `ratio's, and various kinds of `float's and
`complex'es, in any combination. In addition to the inherent complexity
of `bignum' and `ratio' operations, there is also a lot of overhead in
just figuring out which operation to do and what contagion and
canonicalization rules apply. The complexity of generic arithmetic is
so great that it is inconceivable to open code it. Instead, the
compiler does a function call to a generic arithmetic routine, consuming
many instructions before the actual computation even starts.
This is ridiculous, since even Common Lisp programs do a lot of arithmetic, and the
hardware is capable of doing operations on small integers and floats with a
single instruction. To get acceptable efficiency, the compiler special-cases
uses of generic arithmetic that are directly implemented in the hardware. In
order to open code arithmetic, several constraints must be met:
* All the arguments must be known to be a good type of number.
* The result must be known to be a good type of number.
* Any intermediate values such as the result of `(+ a b)' in the call
`(+ a b c)' must be known to be a good type of number.
* All the above numbers with good types must be of the SAME good
type. Don't try to mix integers and floats or different float
formats.
The "good types" are `(signed-byte 32)', `(unsigned-byte 32)',
`single-float' and `double-float'. See the following sections for more
discussion of good numeric types.
`float' is not a good type, since it might mean either `single-float' or
`double-float'. `integer' is not a good type, since it might mean
`bignum'. `rational' is not a good type, since it might mean `ratio'.
Note however that these types are still useful in declarations, since
type inference may be able to strengthen a weak declaration into a good
one, when it would be at a loss if there was no declaration at all
(*Note Type Inference::). The `integer' and `unsigned-byte' (or
non-negative integer) types are especially useful in this regard, since
they can often be strengthened to a good integer type.
Arithmetic with `complex' numbers is inefficient in comparison to float and
integer arithmetic. Complex numbers are always represented with a pointer
descriptor (causing consing overhead), and complex arithmetic is always closed
coded using the general generic arithmetic functions. But arithmetic with
complex types such as:
(complex float)
(complex fixnum)
is still faster than `bignum' or `ratio' arithmetic, since the
implementation is much simpler.
Note: don't use `/' to divide integers unless you want the overhead of
rational arithmetic. Use `truncate' even when you know that the
arguments divide evenly.
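The difference can be seen directly:

(/ 7 2)          ; => 7/2, a ratio -- rational arithmetic
(truncate 7 2)   ; => 3, 1 -- open-coded integer division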
You don't need to remember all the rules for how to get open-coded
arithmetic, since efficiency notes will tell you when and where there is
a problem.
File: cmu-user.info Node: Generic Arithmetic-Footnotes, Up: Generic Arithmetic
(9) As Steele notes in CLTL II, this is a generic conception of generic,
and is not to be confused with the CLOS concept of a generic function.
File: cmu-user.info Node: Fixnums, Prev: Generic Arithmetic, Up: Numbers, Next: Word Integers
Fixnums
-------
A fixnum is a "FIXed precision NUMber". In modern Common Lisp
implementations, fixnums can be represented with an immediate
descriptor, so operating on fixnums requires no consing or memory
references. Clever choice of representations also allows some
arithmetic operations to be done on fixnums using hardware supported
word-integer instructions, somewhat reducing the speed penalty for using
an unnatural integer representation.
It is useful to distinguish the `fixnum' type from the fixnum
representation of integers. In Python, there is absolutely nothing
magical about the `fixnum' type in comparison to other finite integer
types. `fixnum' is equivalent to (is defined with `deftype' to be)
`(signed-byte 30)'. `fixnum' is simply the largest subset of integers
that can be represented using an immediate fixnum descriptor.
Unlike in other Common Lisp compilers, it is in no way desirable to use the
`fixnum' type in declarations in preference to more restrictive integer types
such as `bit', `(integer -43 7)' and `(unsigned-byte 8)'. Since
Python does understand these integer types, it is preferable to use the more
restrictive type, as it allows better type inference (*Note Operation Specific Type Inference::.)
The small, efficient fixnum is contrasted with bignum, or "BIG NUMber".
This is another descriptor representation for integers, but this time a
pointer representation that allows for arbitrarily large integers.
Bignum operations are less efficient than fixnum operations, both
because of the consing and memory reference overheads of a pointer
descriptor, and also because of the inherent complexity of extended
precision arithmetic. While fixnum operations can often be done with a
single instruction, bignum operations are so complex that they are
always done using generic arithmetic.
A crucial point is that the compiler will use generic arithmetic if
it can't PROVE that all the arguments, intermediate values, and results are
fixnums. With bounded integer types such as `fixnum', the result type proves
to be especially problematical, since these types are not closed under
common arithmetic operations such as `+', `-', `*' and `/'. For
example, `(1+ (the fixnum x))' does not necessarily evaluate to a
`fixnum'. Bignums were added to Common Lisp to get around this problem, but they
really just transform the correctness problem "if this add overflows, you will
get the wrong answer" to the efficiency problem "if this add MIGHT overflow
then your program will run slowly (because of generic arithmetic.)"
There is just no getting around the fact that the hardware only directly
supports short integers. To get the most efficient open coding, the
compiler must be able to prove that the result is a good integer type.
This is an argument in favor of using more restrictive integer types:
`(1+ (the fixnum x))' may not always be a `fixnum', but `(1+ (the
(unsigned-byte 8) x))' always is. Of course, you can also assert the
result type by putting in lots of `the' declarations and then compiling
with `safety' `0'.
File: cmu-user.info Node: Word Integers, Prev: Fixnums, Up: Numbers, Next: Floating Point Efficiency
Word Integers
-------------
Python is unique in its efficient implementation of arithmetic
on full-word integers through non-descriptor representations and open coding.
Arithmetic on any subtype of these types:
(signed-byte 32)
(unsigned-byte 32)
is reasonably efficient, although subtypes of `fixnum' remain somewhat
more efficient.
If a word integer must be represented as a descriptor, then the `bignum'
representation is used, with its associated consing overhead. The
support for word integers in no way changes the language semantics, it
just makes arithmetic on small bignums vastly more efficient. It is
fine to do arithmetic operations with mixed `fixnum' and word integer
operands; just declare the most specific integer type you can, and let
the compiler decide what representation to use.
In fact, to most users, the greatest advantage of word integer
arithmetic is that it effectively provides a few guard bits on the
fixnum representation. If there are missing assertions on intermediate
values in a fixnum expression, the intermediate results can usually be
proved to fit in a word. After the whole expression is evaluated, there
will often be a fixnum assertion on the final result, allowing creation
of a fixnum result without even checking for overflow.
The remarks in section *Note Fixnums:: about fixnum result type also
apply to word integers; you must be careful to give the compiler enough
information to prove that the result is still a word integer. This
time, though, when we blow out of word integers we land in generic
bignum arithmetic, which is much worse than sleazing from `fixnum's to
word integers. Note that mixing `(unsigned-byte 32)' arguments with
arguments of any signed type (such as `fixnum') is a no-no, since the
result might not be unsigned.
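A sketch of word-integer declarations (the function is hypothetical):

(defun add32 (a b)
  ;; Both arguments fit in 31 bits, so the sum fits in 32 bits
  ;; and the addition can be open-coded on word integers.
  (declare (type (unsigned-byte 31) a b))
  (the (unsigned-byte 32) (+ a b)))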
File: cmu-user.info Node: Floating Point Efficiency, Prev: Word Integers, Up: Numbers, Next: Specialized Arrays
Floating Point Efficiency
-------------------------
Arithmetic on objects of type `single-float' and `double-float' is
efficiently implemented using non-descriptor representations and open
coding. As for integer arithmetic, the arguments must be known to be of
the same float type. Unlike for integer arithmetic, the results and
intermediate values usually take care of themselves due to the rules of
float contagion, i.e. `(1+ (the single-float x))' is always a
`single-float'.
Although they are not specially implemented, `short-float' and
`long-float' are also acceptable in declarations, since they are
synonyms for the `single-float' and `double-float' types, respectively.
It is harmless to use list-style float type specifiers such as
`(single-float 0.0 1.0)', but Python currently makes little use of
bounds on float types.
When a float must be represented as a descriptor, a pointer
representation is used, creating consing overhead. For this reason, you
should try to avoid situations (such as full call and non-specialized
data structures) that force a descriptor representation. See sections ?
and ?.
*Note Floats:: for information on the extensions to support IEEE
floating point.
File: cmu-user.info Node: Specialized Arrays, Prev: Floating Point Efficiency, Up: Numbers, Next: Interactions With Local Call
Specialized Arrays
------------------
CMU Common Lisp supports specialized array element types through the `:element-type'
argument to `make-array'. When an array has a specialized element type, only
elements of that type can be stored in the array. From this restriction
come two major efficiency advantages:
* A specialized array can save space by packing multiple elements
into a single word. For example, a `base-char' array can have 4
elements per word, and a `bit' array can have 32. This
space-efficient representation is possible because it is not
necessary to separately indicate the type of each element.
* The elements in a specialized array can be given the same
non-descriptor representation as the one used in registers and on
the stack, eliminating the need for representation conversions when
reading and writing array elements. For objects with pointer
descriptor representations (such as floats and word integers) there
is also a substantial consing reduction because it is not necessary
to allocate a new object every time an array element is modified.
These are the specialized element types currently supported:
bit
(unsigned-byte 2)
(unsigned-byte 4)
(unsigned-byte 8)
(unsigned-byte 16)
(unsigned-byte 32)
base-character
single-float
double-float
Although a `simple-vector' can hold any type of object, `t' should
still be considered a specialized array type, since arrays with element
type `t' are specialized to hold descriptors.
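To illustrate, a specialized array is requested by passing the desired
element type to `make-array'; this sketch shows typical usage:

    ;;; A packed bit vector (32 elements per word) and an unboxed
    ;;; double-float vector.
    (make-array 64 :element-type 'bit :initial-element 0)
    (make-array 100 :element-type 'double-float
                    :initial-element 0.0d0)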
When using non-descriptor representations, it is particularly important
to make sure that array accesses are open-coded, since in addition to
the generic operation overhead, efficiency is lost when the array
element is converted to a descriptor so that it can be passed to (or
from) the generic access routine. You can detect inefficient array
accesses by enabling efficiency notes (*Note Efficiency Notes::.)
*Note Arrays:: for more general information on arrays.
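As a sketch of how to get open-coded accesses, declare the full array
type (including the element type) at the point of use; `vector-sum' is
a hypothetical name:

    ;;; With this declaration, AREF can be open-coded and the
    ;;; double-float elements need never be converted to descriptors.
    (defun vector-sum (a)
      (declare (type (simple-array double-float (*)) a))
      (let ((sum 0d0))
        (declare (double-float sum))
        (dotimes (i (length a) sum)
          (incf sum (aref a i)))))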
File: cmu-user.info Node: Interactions With Local Call, Prev: Specialized Arrays, Up: Numbers, Next: Representation of Characters
Interactions With Local Call
----------------------------
Local call has many advantages (*Note Local Call::); one relevant to
our discussion here is that local call extends the usefulness of non-descriptor
representations. If the compiler knows from the argument type that an argument
has a non-descriptor representation, then the argument will be passed in that
representation. The easiest way to ensure that the argument type is known at
compile time is to always declare the argument type in the called function,
like:
(defun 2+f (x)
(declare (single-float x))
(+ x 2.0))
The advantages of passing arguments and return values in a
non-descriptor representation are the same as for non-descriptor
representations in general: reduced consing and memory access (*Note
Non-Descriptor Representations::.) This extends the applicative
programming styles discussed in section *Note Local Call:: to numeric
code. Also, if source files are kept reasonably small, block
compilation can be used to reduce number consing to a minimum.
Note that non-descriptor return values can only be used with the known return
convention (section *Note Return Values::.) If the compiler can't prove that
a function always returns the same number of values, then it must use the
unknown values return convention, which requires a descriptor representation.
Pay attention to the known return efficiency notes to avoid number consing.
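One way to make the number and type of return values known to the
compiler is a global `ftype' declaration; this is a sketch with a
hypothetical function name:

    ;;; Declares that SCALE returns exactly one single-float, so
    ;;; callers can use the known-values return convention.
    (declaim (ftype (function (single-float single-float)
                              (values single-float))
                    scale))
    (defun scale (x factor)
      (declare (single-float x factor))
      (* x factor))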
File: cmu-user.info Node: Representation of Characters, Prev: Interactions With Local Call, Up: Numbers
Representation of Characters
----------------------------
Python also uses a non-descriptor representation for characters when
convenient. This improves the efficiency of string manipulation, but is
otherwise pretty invisible; characters have an immediate descriptor
representation, so there is not a great penalty for converting a
character to a descriptor. Nonetheless, it may sometimes be helpful to
declare character-valued variables as `base-character'.
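For example (a sketch; `count-char' is a hypothetical name):

    ;;; Declaring the string and character types lets the compiler
    ;;; use a non-descriptor character representation in the loop.
    (defun count-char (string char)
      (declare (simple-string string)
               (base-character char))
      (let ((count 0))
        (declare (fixnum count))
        (dotimes (i (length string) count)
          (when (char= (schar string i) char)
            (incf count)))))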
File: cmu-user.info Node: General Efficiency Hints, Prev: Numbers, Up: Advanced Compiler Use and Efficiency Hints, Next: Efficiency Notes
General Efficiency Hints
========================
This section is a summary of various implementation costs and ways to
get around them. These hints are relatively unrelated to the use of the
Python compiler, and probably also apply to most other Common Lisp
implementations. In each section, there are references to related
in-depth discussion.
* Menu:
* Compile Your Code::
* Avoid Unnecessary Consing::
* Complex Argument Syntax::
* Mapping and Iteration::
* Trace Files and Disassembly::
File: cmu-user.info Node: Compile Your Code, Prev: General Efficiency Hints, Up: General Efficiency Hints, Next: Avoid Unnecessary Consing
Compile Your Code
-----------------
At this point, the advantages of compiling code relative to running it
interpreted probably need not be emphasized too much, but remember that
in CMU Common Lisp, compiled code typically runs hundreds of times
faster than interpreted code. Also, compiled (`fasl') files load
significantly faster than source files, so it is worthwhile compiling
files which are loaded many times, even if the speed of the functions in
the file is unimportant.
Even disregarding the efficiency advantages, compiled code is as good or
better than interpreted code. Compiled code can be debugged at the
source level (see chapter *Note The Debugger::), and compiled code does
more error checking. For these reasons, the interpreter should be
regarded mainly as an interactive command interpreter, rather than as a
programming language implementation.
Do not be concerned about the performance of your program until you see
its speed compiled. Some techniques that make compiled code run faster
make interpreted code run slower.
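A minimal compile-and-load cycle looks like this (the file name is
hypothetical, and the extension of the compiled output file is
implementation dependent):

    (compile-file "frob.lisp")   ; writes a compiled fasl file
    (load "frob")                ; loads the compiled version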
File: cmu-user.info Node: Avoid Unnecessary Consing, Prev: Compile Your Code, Up: General Efficiency Hints, Next: Complex Argument Syntax
Avoid Unnecessary Consing
-------------------------
Consing is another name for allocation of storage, as done by
the `cons' function (hence its name.) `cons' is by no means the only
function which conses -- so do `make-array' and many other functions.
Arithmetic and function call can also have hidden consing overheads. Consing
hurts performance in the following ways:
* Consing reduces memory access locality, increasing paging activity.
* Consing takes time just like anything else.
* Any space allocated eventually needs to be reclaimed, either by
garbage collection or by starting a new `lisp' process.
Consing is not undiluted evil, since programs do things other than
consing, and appropriate consing can speed up the real work. It would
certainly save time to allocate a vector of intermediate results that
are reused hundreds of times. Also, if it is necessary to copy a large
data structure many times, it may be more efficient to update the data
structure non-destructively; this somewhat increases update overhead,
but makes copying trivial.
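As a sketch of appropriate consing, a vector of intermediate results
can be allocated once and reused across calls. The names here are
hypothetical, the inputs are assumed non-negative and no longer than
the scratch vector, and sharing a scratch vector this way is not
reentrant:

    ;;; One scratch vector, allocated once instead of consing a
    ;;; fresh list of intermediate results on every call.
    (defvar *scratch* (make-array 1024 :element-type 'single-float))

    (defun sum-roots (v)
      (declare (type (simple-array single-float (*)) v))
      (let ((tmp *scratch*))
        (declare (type (simple-array single-float (*)) tmp))
        (dotimes (i (length v))
          (setf (aref tmp i) (sqrt (aref v i))))
        (let ((sum 0f0))
          (declare (single-float sum))
          (dotimes (i (length v) sum)
            (incf sum (aref tmp i))))))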
Note that the remarks in section *Note Writing Efficient Code:: about
the importance of separating tuning from coding also apply to consing
overhead. The majority of consing will be done by a small portion of
the program. The consing hot spots are even less predictable than the
CPU hot spots, so don't waste time and create bugs by doing unnecessary
consing optimization. During initial coding, avoid unnecessary
side-effects and cons where it is convenient. If profiling reveals a
consing problem, THEN go back and fix the hot spots.
*Note Non-Descriptor Representations:: for a discussion of how to avoid
number consing in Python.
File: cmu-user.info Node: Complex Argument Syntax, Prev: Avoid Unnecessary Consing, Up: General Efficiency Hints, Next: Mapping and Iteration
Complex Argument Syntax
-----------------------
Common Lisp has very powerful argument passing mechanisms. Unfortunately, two
of the most powerful mechanisms, rest arguments and keyword arguments, have a
significant performance penalty:
* With keyword arguments, the called function has to parse the
supplied keywords by iterating over them and checking them against
the desired keywords.
* With rest arguments, the function must cons a list to hold the
arguments. If a function is called many times or with many
arguments, large amounts of memory will be allocated.
Although rest argument consing is worse than keyword parsing, neither
problem is serious unless thousands of calls are made to such a
function. The use of keyword arguments is strongly encouraged in
functions with many arguments or with interfaces that are likely to be
extended, and rest arguments are often natural in user interface
functions.
Optional arguments have some efficiency advantage over keyword
arguments, but their syntactic clumsiness and lack of extensibility has
caused many CMU Common Lisp programmers to abandon use of optionals
except in functions that have obviously simple and immutable interfaces
(such as `subseq'), or in functions that are only called in a few
places. When defining an interface function to be used by other
programmers or users, use of only required and keyword arguments is
recommended.
Parsing of `defmacro' keyword and rest arguments is done at compile
time, so a macro can be used to provide a convenient syntax with an
efficient implementation. If the macro-expanded form contains no
keyword or rest arguments, then it is perfectly acceptable in inner
loops.
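For example, a macro can present a keyword-argument syntax whose
parsing cost is paid entirely at compile time (a sketch; the macro
name is hypothetical):

    (defmacro with-step (place &key (delta 1))
      ;; The :delta keyword is parsed at macroexpansion time; the
      ;; expansion is a plain INCF with no run-time keyword parsing.
      `(incf ,place ,delta))

So `(with-step counter :delta 5)' simply expands to
`(incf counter 5)'.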
Keyword argument parsing overhead can also be avoided by use of inline
expansion (*Note Inline Expansion::) and block compilation (section
*Note Block Compilation::.)
Note: the compiler open-codes most heavily used system functions which
have keyword or rest arguments, so that no run-time overhead is
involved.
File: cmu-user.info Node: Mapping and Iteration, Prev: Complex Argument Syntax, Up: General Efficiency Hints, Next: Trace Files and Disassembly
Mapping and Iteration
---------------------
One of the traditional Common Lisp programming styles is a highly applicative one,
involving the use of mapping functions and many lists to store intermediate
results. To compute the sum of the square-roots of a list of numbers, one
might say:
(apply #'+ (mapcar #'sqrt list-of-numbers))
This programming style is clear and elegant, but unfortunately results
in slow code. There are two reasons why:
* The creation of lists of intermediate results causes much consing
(see *Note Avoid Unnecessary Consing::).
* Each level of application requires another scan down the list.
Thus, disregarding other effects, the above code would probably
take twice as long as a straightforward iterative version.
An example of an iterative version of the same code:
(do ((num list-of-numbers (cdr num))
(sum 0 (+ (sqrt (car num)) sum)))
((null num) sum))
See sections *Note Variable Type Inference:: and *Note Let
Optimization:: for a discussion of the interactions of iteration
constructs with type inference and variable optimization. Also, section
*Note Local Tail Recursion:: discusses an applicative style of
iteration.
File: cmu-user.info Node: Trace Files and Disassembly, Prev: Mapping and Iteration, Up: General Efficiency Hints
Trace Files and Disassembly
---------------------------
In order to write efficient code, you need to know the relative costs of
different operations. The main reason why writing efficient Common Lisp
code is difficult is that there are so many operations, and the costs of
these operations vary in obscure context-dependent ways. Although
efficiency notes point out some problem areas, the only way to ensure
generation of the best code is to look at the assembly code output.
The `disassemble' function is a convenient way to get the assembly code
for a function, but it can be very difficult to interpret, since the
correspondence with the original source code is weak. A better (but
more awkward) option is to use the :trace-file argument to
`compile-file' to generate a trace file.
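For example:

    ;;; Writes the annotated internal representations to frob.trace
    ;;; (the file names here are hypothetical.)
    (compile-file "frob.lisp" :trace-file "frob.trace")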
A trace file is a dump of the compiler's internal representations, including
annotated assembly code. Each component in the program gets three pages in
the trace file (separated by "`^L'"):
* The implicit-continuation (or IR1) representation of the optimized
source. This is a dump of the flow graph representation used for
"source level" optimizations. As you will quickly notice, it is
not really very close to the source. This representation is not
very useful to even sophisticated users.
* The Virtual Machine (VM, or IR2) representation of the program.
This dump represents the generated code as sequences of "Virtual
OPerations" (VOPs.) This representation is intermediate between
the source and the assembly code -- each VOP corresponds fairly
directly to some primitive function or construct, but a given VOP
also has a fairly predictable instruction sequence. An operation
(such as `+') may have multiple implementations with different cost
and applicability. The choice of a particular VOP such as
`+/fixnum' or `+/single-float' represents this choice of
implementation. Once you are familiar with it, the VM
representation is probably the most useful for determining what
implementation has been used.
* An assembly listing, annotated with the VOP responsible for
generating the instructions. This listing is useful for figuring
out what a VOP does and how it is implemented in a particular
context, but its large size makes it more difficult to read.
Note that trace file generation takes much space and time, since the
trace file is tens of times larger than the source file. To avoid huge
confusing trace files and much wasted time, it is best to separate the
critical program portion into its own file and then generate the trace
file from this small file.
File: cmu-user.info Node: Efficiency Notes, Prev: General Efficiency Hints, Up: Advanced Compiler Use and Efficiency Hints, Next: Profiling
Efficiency Notes
================
Efficiency notes are messages that warn the user that the compiler has
chosen a relatively inefficient implementation for some operation.
Usually an efficiency note reflects the compiler's desire for more type
information. If the type of the values concerned is known to the
programmer, then additional declarations can be used to get a more
efficient implementation.
Efficiency notes are controlled by the `extensions:inhibit-warnings'
optimization quality (*Note The Optimize Declaration::.) When `speed'
is greater than `extensions:inhibit-warnings', efficiency notes are
enabled. Note that this implicitly enables efficiency notes whenever
`speed' is increased from its default of `1'.
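For example, this global declaration enables efficiency notes, since
`speed' is greater than `extensions:inhibit-warnings':

    (declaim (optimize (speed 3) (extensions:inhibit-warnings 0)))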
Consider this program with an obscure missing declaration:
(defun eff-note (x y z)
(declare (fixnum x y z))
(the fixnum (+ x y z)))
If compiled with `(speed 3) (safety 0)', this note is given:
In: DEFUN EFF-NOTE
(+ X Y Z)
==>
(+ (+ X Y) Z)
Note: Forced to do inline (signed-byte 32) arithmetic (cost 3).
Unable to do inline fixnum arithmetic (cost 2) because:
The first argument is a (INTEGER -1073741824 1073741822),
not a FIXNUM.
This efficiency note tells us that the result of the intermediate computation
`(+ x y)' is not known to be a `fixnum', so the addition of the
intermediate sum to `z' must be done less efficiently. This can be fixed by
changing the definition of `eff-note':
(defun eff-note (x y z)
(declare (fixnum x y z))
(the fixnum (+ (the fixnum (+ x y)) z)))
* Menu:
* Type Uncertainty::
* Efficiency Notes and Type Checking::
* Representation Efficiency Notes::
* Verbosity Control::
File: cmu-user.info Node: Type Uncertainty, Prev: Efficiency Notes, Up: Efficiency Notes, Next: Efficiency Notes and Type Checking
Type Uncertainty
----------------
The main cause of inefficiency is the compiler's lack of adequate
information about the types of function argument and result values.
Many important operations (such as arithmetic) have an inefficient
general (generic) case, but have efficient implementations that can
usually be used if there is sufficient argument type information.
Type efficiency notes are given when a value's type is uncertain. There is an
important distinction between values that are not known to be of a good
type (uncertain) and values that are known not to be of a good type.
Efficiency notes are given mainly for the first case (uncertain types.) If it
is clear to the compiler that there is not an efficient implementation for
a particular function call, then an efficiency note will only be given if the
`extensions:inhibit-warnings' optimization quality is `0' (*Note The Optimize Declaration::.)
In other words, the default efficiency notes only suggest that you add
declarations, not that you change the semantics of your program so that an
efficient implementation will apply. For example, compilation of this form
will not give an efficiency note:
(elt (the list l) i)
even though a vector access is more efficient than indexing a list.
File: cmu-user.info Node: Efficiency Notes and Type Checking, Prev: Type Uncertainty, Up: Efficiency Notes, Next: Representation Efficiency Notes
Efficiency Notes and Type Checking
----------------------------------
It is important that the `eff-note' example above used
`(safety 0)'. When type checking is enabled, you may get apparently
spurious efficiency notes. With `(safety 1)', the note has this extra
line on the end:
The result is a (INTEGER -1610612736 1610612733), not a FIXNUM.
This seems strange, since there is a `the' declaration on the result of
that second addition.
In fact, the inefficiency is real, and is a consequence of Python's
treating declarations as assertions to be verified. The compiler can't
assume that the result type declaration is true -- it must generate the
result and then test whether it is of the appropriate type.
In practice, this means that when you are tuning a program to run without type
checks, you should work from the efficiency notes generated by unsafe
compilation. If you want code to run efficiently with type checking, then you
should pay attention to all the efficiency notes that you get during safe
compilation. Since user-supplied output type assertions (e.g., from `the')
are disregarded when selecting operation implementations for safe code, you
must somehow give the compiler information that allows it to prove that the
result truly must be of a good type. In our example, it could be done by
constraining the argument types more:
(defun eff-note (x y z)
(declare (type (unsigned-byte 18) x y z))
(+ x y z))
Of course, this declaration is acceptable only if the arguments to
`eff-note' always ARE `(unsigned-byte 18)' integers.